Abstract

Based on molecular modeling techniques, we constructed a rational 3D model of ORF3 of the SARS coronavirus (SARS-CoV). Our studies
suggest that ORF3 could be involved in FAD/NAD binding, according to its predicted structure and comparison with its
structure neighbors. Furthermore, we identified three pairs of non-canonical N-H···π interactions in the structure of ORF3, which can
contribute to the stability of the protein structure. These results provide important clues for better understanding SARS-CoV ORF3 and
for exploring new therapeutic strategies.

© 2004 Elsevier B.V. All rights reserved. 
1. Introduction

Sequence analysis of the SARS coronavirus genome reveals
that it contains five major open reading frames (ORFs) that
encode a polymerase and the S, M, E and N proteins, like those
of other coronaviruses. However, its nine additional potential
ORFs are not found in other coronaviruses [1]. In theory, all of
these proteins can be used as targets in drug and vaccine design.
However, the functions of these unknown proteins are difficult
to understand because of their very poor sequence homology
with proteins available in the Protein Data Bank.

The 3D Jury system [2] utilizes a global network of
independent structure prediction servers to detect patterns
of structural similarity between diverse models and to select
the correct fold from a set of borderline predictions. An
exciting finding based on this method [3] is the assignment
of an mRNA cap-1 methyltransferase function to the nsp13
protein of the SARS coronavirus (3D Jury score > 100). In
this study, we used the 3D Jury meta-server for fold
recognition of ORF3 and constructed a rational molecular
model in order to understand the potential function of ORF3
in terms of its tertiary structure.
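The idea of selecting a fold by agreement among independent predictions can be illustrated with a small sketch. This is a simplified consensus rule (mean pairwise similarity), not the actual 3D Jury scoring function; the server names and similarity values below are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def consensus_scores(similarity: Sequence[Sequence[float]]) -> List[float]:
    """Return, for each model, its mean similarity to all other models."""
    n = len(similarity)
    scores = []
    for i in range(n):
        others = [similarity[i][j] for j in range(n) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    return scores


def pick_consensus_model(
    names: Sequence[str], similarity: Sequence[Sequence[float]]
) -> Tuple[str, float]:
    """Pick the model that agrees best, on average, with the others."""
    scores = consensus_scores(similarity)
    best = max(range(len(names)), key=lambda i: scores[i])
    return names[best], scores[best]


# Hypothetical models from four servers and a hypothetical symmetric
# similarity matrix (e.g. number of superimposable residues per pair).
names = ["server_A", "server_B", "server_C", "server_D"]
similarity = [
    [0, 80, 75, 10],
    [80, 0, 90, 12],
    [75, 90, 0, 8],
    [10, 12, 8, 0],
]

best_name, best_score = pick_consensus_model(names, similarity)
print(best_name, round(best_score, 2))
```

In this toy input, three models resemble one another and one is an outlier, so the outlier receives the lowest consensus score.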
2. Materials and methods

The sequence of the ORF3 protein of SARS-CoV was
downloaded from GenBank (NP_828851) and used for fold
prediction by the 3D Jury system [2], a comprehensive
protein structure prediction meta-server that includes more
than 10 novel fold recognition methods and made a dramatic
impact on the critical assessment of protein structure
prediction (CASP-5) in 2002. The proteins with a sufficiently
high 3D score were used as templates to construct 3D models
of ORF3 using the MODELLER program [4]. The quality of
the 3D models was evaluated by the ProQ program [5], and the
best model was used for further analyses. Specifically, to
obtain information about the possible function of ORF3, the VAST
(http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html),
DALI (http://www.ebi.ac.uk/dali/) and CE [6] programs
were employed to search for structure neighbors of the
ORF3 protein. Structural comparison was performed with
LGA [7]. Finally, the NCI program [8] was used to identify
non-canonical interactions in the protein structure. The 3D
structure was visualized with PROTEINEXPLORER
(http://www.proteinexplorer.org).

3. Results and discussion 

The 3D Jury system found three significant hits (3D 
score >90) which have a similar fold to ORF3 (threading 
Table 1

The sequence alignment between ORF3 and 1LVL and the secondary structure

EEEEEEEEEEE HHHHHHHH HHHHHHHH EE
ORF3 MRFFTLRSITAQPVKIDNASPASTVHATATIPLQASLPFGWLVIGV--AFLAVFQSATKI
1LVL -QQTIQTTLLI-IGGGPGGYVAAIRAG-QLGIPT

EEE EEHHHHHHHHHHHHHHHHHHHHHHHHHHHH HHHHHH
ORF3 IALNKRWQLALYKGFQFICNLLLLFVTIYSHLLLVA-AGMEAQFLYL
1LVL VLV--EGQALGGTCLNIGCIPSKALIHVAEQFHQASRFTEPSPLGISVASPRLDIGQSVA

HHHHHHHHHHHHHHHHHHHHHEEE EEEE EEEEE E EEE
ORF3 YALIYFLQCINACRIIMRCWLCWKCKSKNPLLYDANYFVCW-HTHNYDYCIPYNS--VTD
1LVL WKDGIVDRLTTGVAALLKKHGVKVVHGWAKVL-DGKQVEVDGQRIQCEHLLLATGSSSVE

EEEEE HHH EEEEEEEEEEEEEEEE EEE
ORF3 TIVVTEGDGISTPKLKEDYQIGGYSEDRHSGVKDYVVVHGYFTEVYYQLESTQITTDTGI
1LVL LPRRPRTKGFNLECLDLKMNGAAIAIDERCQTSMHNVWAIGD--VAGEPMLAHRAMAQG-

HHHHHHHHHH
ORF3 ENATFFIFNKLVK-DPPNVQIHTIDGSSGVANPAMDPIYDEPTTTTSVPL
1LVL EMVAEIIAGKARRFE-


server PCONS2): 1LVL (Pseudomonas putida lipoamide
dehydrogenase, 3D score 102), 3GRS (human glutathione
reductase, 3D score 99) and 1GES_A (Escherichia coli
glutathione reductase, 3D score 95.5). Using 1LVL, 3GRS
and 1GES_A as templates, the corresponding 3D models of
ORF3 were generated, and the quality of each model was
evaluated by the ProQ program with two measures.
The results are as follows: 1LVL (ProQ-LG = 2.826,
ProQ-MX = 0.249), 3GRS (ProQ-LG = 1.498, ProQ-MX =
0.146), 1GES (ProQ-LG = 1.601, ProQ-MX = 0.14). Thus
all 3D models of ORF3 based on these templates qualify as a
‘correct model’, and the 1LVL-based model is close to a ‘good
model’ (the cutoffs for the two quality measures are ProQ-LG >
1.5 or ProQ-MX > 0.1 for a correct model, and ProQ-LG > 3
or ProQ-MX > 0.5 for a good model). We therefore took the
model built on template 1LVL as the 3D model of ORF3.
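The model selection described above can be reproduced as a short sketch. It applies the ProQ cutoffs quoted in the text to the three reported score pairs; the function and variable names are illustrative, and ranking by ProQ-LG is an assumption about how "best" is decided (here ProQ-MX gives the same ranking).

```python
# ProQ cutoffs as quoted in the text.
CORRECT_LG, CORRECT_MX = 1.5, 0.1
GOOD_LG, GOOD_MX = 3.0, 0.5


def classify_proq(lg: float, mx: float) -> str:
    """Classify a model from its ProQ-LG and ProQ-MX scores."""
    if lg > GOOD_LG or mx > GOOD_MX:
        return "good"
    if lg > CORRECT_LG or mx > CORRECT_MX:
        return "correct"
    return "below cutoff"


# (ProQ-LG, ProQ-MX) for the models built on each template, from the text.
models = {
    "1LVL": (2.826, 0.249),
    "3GRS": (1.498, 0.146),
    "1GES": (1.601, 0.14),
}

labels = {name: classify_proq(lg, mx) for name, (lg, mx) in models.items()}
best_template = max(models, key=lambda name: models[name][0])

for name, label in labels.items():
    print(name, label)
print("selected template:", best_template)
```

Note that the 3GRS model passes only through the ProQ-MX criterion, since its ProQ-LG of 1.498 falls just under 1.5.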
The alignment between ORF3 and 1LVL and secondary 



Fig. 1. The predicted 3D model of SARS-CoV ORF3 based on template
1LVL (Pseudomonas putida lipoamide dehydrogenase), which consists of
5 α helices and 6 β sheets.


structure are shown in Table 1. Fig. 1 shows the 3D model of
ORF3 built on template 1LVL, which consists of 5 α helices
and 6 β sheets. The three trans-membrane regions (34-56,
77-99, and 103-125) predicted by Marra et al. [1]
correspond to three helical regions in the alignment:
43-58, 71-98, and 103-129. Two additional helical
regions are also present in the structure: 3-12 and 138-147.
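The correspondence between the predicted trans-membrane regions and the modeled helices can be checked by counting shared residue positions. The sketch below uses the residue ranges given in the text; the helper names are illustrative, and treating ranges as inclusive is an assumption.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive residue range (start, end)

# Residue ranges taken from the text.
tm_regions: List[Range] = [(34, 56), (77, 99), (103, 125)]
helices: List[Range] = [(3, 12), (43, 58), (71, 98), (103, 129), (138, 147)]


def overlap(a: Range, b: Range) -> int:
    """Number of residue positions shared by two inclusive ranges."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def best_helix(tm: Range, candidates: List[Range]) -> Tuple[Range, int]:
    """Return the helix sharing the most residues with a TM region."""
    helix = max(candidates, key=lambda h: overlap(tm, h))
    return helix, overlap(tm, helix)


matches = {tm: best_helix(tm, helices) for tm in tm_regions}
for tm, (helix, shared) in matches.items():
    print(f"TM {tm[0]}-{tm[1]} -> helix {helix[0]}-{helix[1]} ({shared} residues shared)")
```

The first trans-membrane region shares the fewest positions with its helix, because the modeled helix starts at residue 43 rather than 34.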

Naturally, a question arises: what information about
ORF3’s function can we get from its 3D structure? The
above three templates (1LVL, 3GRS and 1GES), which
belong to the FAD/NAD-linked reductase family, lead us to
speculate that ORF3 may be a protein related to
FAD/NAD binding. This seems consistent with the
speculation that ORF3 may encode a protein related to



Fig. 2. The superposition of SARS-CoV ORF3 (white) with other
dehydrogenases: 1OJT (lipoamide dehydrogenase from bacterium
Neisseria meningitidis, blue), 1JEH (lipoamide dehydrogenase from
yeast Saccharomyces cerevisiae, gray), 1LPF (lipoamide dehydrogenase
from bacterium Pseudomonas fluorescens, green), 1EBD (dihydrolipoamide
dehydrogenase from bacterium Bacillus stearothermophilus, red), and
1DXL (lipoamide dehydrogenase from pea Pisum sativum, yellow).
ATP-binding [1], because FAD/NAD is generally used as an
oxidant to yield ATP [9].

To collect as much evidence as possible for this
speculation, we employed VAST, DALI and CE to search
for structure neighbors of ORF3. Apart from the above
templates, the top hits are the following dehydrogenases
(which also belong to the FAD/NAD-linked reductase
family): 1OJT (lipoamide dehydrogenase from the
bacterium Neisseria meningitidis), 1JEH (lipoamide
dehydrogenase from the yeast Saccharomyces cerevisiae), 1LPF
(lipoamide dehydrogenase from the bacterium Pseudomonas
fluorescens), 1EBD (dihydrolipoamide dehydrogenase
from the bacterium Bacillus stearothermophilus), and 1DXL
(lipoamide dehydrogenase from the pea Pisum sativum). The
superpositions of these enzymes with ORF3 are shown
in Fig. 2. The revised structure alignment (Fig. 3) shows
a number of similar structural patterns with conserved
residues (bold representation). In particular, these include
a partially conserved FAD-binding motif (marked as ‘#’)
and NAD-binding motif (marked as ‘$’) [10-14]. This
suggests that there could indeed be a link between ORF3
and FAD/NAD-binding proteins.
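What "partially conserved" means for a binding motif can be illustrated by scanning sequences for a pattern. The sketch below assumes, as an illustration only, that the motif of interest resembles the glycine-rich GxGxxG fingerprint commonly found in dinucleotide-binding folds; the text does not state the exact pattern marked in Fig. 3. The two fragments are copied from the first block of the Fig. 3 alignment with gaps removed, and the label `neighbor_row_2` is a placeholder, since the alignment rows are unlabeled.

```python
import re
from typing import Dict, List, Tuple

# Assumed motif: GxGxxG (x = any residue). A lookahead is used so that
# overlapping occurrences are all reported.
GXGXXG = re.compile(r"(?=(G.G..G))")


def degap(aligned: str) -> str:
    """Remove alignment gap characters from a sequence row."""
    return aligned.replace("-", "")


def find_motif(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for every GxGxxG occurrence."""
    return [(m.start(), m.group(1)) for m in GXGXXG.finditer(sequence)]


# Rows copied from the first block of the Fig. 3 alignment.
fragments: Dict[str, str] = {
    "ORF3": "ASTVHATATIPLQASLPFGWLVIG--VAFLAVFQSATKIIALNKR-WQLALYK--",
    "neighbor_row_2": "SADAEYDWVL-GGGPGGYSAAFAAADE-GLKVAIVERY-KTLGGVCLN",
}

hits = {name: find_motif(degap(row)) for name, row in fragments.items()}
for name, found in hits.items():
    print(name, found)
```

Under this assumed pattern the dehydrogenase fragment contains a strict match while the ORF3 fragment does not, which is consistent with the motif being only partially conserved in ORF3.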
######
ASTVHATATIPLQASLPFGWLVIG--VAFLAVFQSATKIIALNKR-WQLALYK-- 50
SADAEYDWVL-GGGPGGYSAAFAAADE-GLKVAIVERY-KTLGGVCLN 47
TINKSHDVVII-GGGPAGYVAAIKAA-QLGFNTACVEKR-GKLGGTC-- 44
--SQKFDVWI-GAGPGGYVAAIRAA-QLGLKTACIEKYIGKEGKVALGGTCLN 50
--AIETETLW-GAGPGGYVAAIRAA-QLGQKVTIVEKG-NLGGVCL- 42
SGSDENDWII-GGGPGGYVAAIKAA-QLGFKTTCIEKR-GALGGTCLN 47

G--FQFICNL-LLLFVTIYSHLLL-V-AA-GMEAQFLY 82
V-GCIPSKALLHNAAVIDEVRH-LAANGI-KYPEPELDIDMLR 87
LNV-GCIPSKALLNNSHLFHQM-HTEAQKRG-IDVNGDI-KINVANFQ 88
V-GCIPSKALLDSSYKY-HEAKEAFKVHGIEAK-GV-TIDVPAMV 91
NV-GCIPSKALISASHRYEQAKH-SEEMGIKAENV-TIDFAKVQ 83
V-GCIPSKALLHSSHMYHEAKH-S-FANHGVKVSNVEIDLAAMM 88

LYALIYF--LQCINACRIIMRCWLCWKCKSKNPLLYDANYFVC-WH 126
AYKDGVVSRLTGGLAGMAKSRKVDVIQGDGQFL--DPHHLEVSLTAGDAYEQAAPTGEKK 146
KAKDDAVKQLTGGIELLFKKNKVTYYKGNGSFED--ETKIRVTPVDGL-EGTVKEDH 143
ARKANIVKNLTGGIATLFKANGVTSFEGHGKLLAN--KQVEVTGL-DGKTQ 140
EWKASVVKKLTGGVEGLLKGNKVEIVKGEAYFVD--ANTVRWNG-DSAQ 131
GQKDKAVSNLTRGIEGLFKKNKVTYVKGYGKFV--SPSEISVDTI-EGENT 137

$$$$$$$
THNYDYCIPY-NSVTDTTWTE-GDGISTPKLKE- 147
IVAFKNCIIAAGSRVTKLPFIPEDPRIIDSSGALALKEVPGKLLI-IGGGIIGLEMGTVY 206
ILDVKNIIVATGSEVTPFPGIEIDEEKIVSSTGALSLKEIPKRLTIIGGGIIGLEMGSVY 203
VLEAENVIIASGSRPVEIPPAPLSDDIIVDSTGALEFQAVPKKLGVIGAGVIGLELGSVW 200
TYTFKNAIIATGSRPIELPNFKFSNRILDSTGALNLGEVPKSLVV-IGGGYIGIELGTAY 191
VVKGKHIIIATGSDVKSLPGVTIDEKKIVSSTGALALSEIPKKLVVIGAGYIGLEMGSVW 197

STLGSRLDWEMMDGLMQGADRDL-VKVWQKQNEYRFDNIMVNTKTVAVEPKEDGVYV 262
SRLGSKVTWEFQPQIGASMDGEVAKATQKFLKKQGLDFKLSTKVISAKRNDDKNWEIV 263
ARLGAEVTVLEALDKFLPAADEQ-IAKEALKVLTKQGLNIRLGARVTASEVKKKQVT 256
ANFGTKVTILEGAGEILSGFEKQM-AAIIKKRLKKKGVEVVTNALAKGAEEREDGVT 246
GRIGSEVTWEFASEIVPTMDAEI--RKQFQRSLEKQGMKFKLKTKVVGVDTSGDGVKLT 255

-DYQIGGYSEDRHSGVKDYVVVH 180
TFEGANAPKEPQRYDAVLVAAGRAPNGKLISAEKAGVAVTDRGFIEVDKQMRTNVPHIYA 322
VEDTKTNKQENLEAEVLLVAVGRRPYIAGLGAEKIGLEVDKRGRLVIDDQFNSKFPHIKV 323
VTFTDANGEQKETFDKLIVAVGRRPVTTDLLAADSGVTLDERGFIYVDDHCKTSVPGVFA 316
VTYEANGETKTIDADYVLVTVGRRPNTDELGLEQIGIKMTNRGLIEVDQQCRTSVPNIFA 306
VEPSAGGEQTIIEADWLVSAGRTPFTSGLNLDKIGVETDKLGRILVNERFSTNVSGVYA 315

-GYFTEVYYQLESTQITTDTGIENATFFIFNK-- 210
IG--DIVGQP-MLAHKAVHEGHVAAENCAGHK-- 350
VGD-VTFGPMLAHKAEEEGI-AAVEMLKTGH- 351
IG-DVVRGAMLAHKASEEGVMVAERIAGHK-- 344
IG--DIVPGPA-LAHKASYEGKVAAEAIAGHP-- 334
IGD-VIPGPMLAHKAEEDGV--ACVEYLAGKV 343


Fig. 3. The structure alignment between SARS-CoV ORF3 and other dehydrogenases: 1OJT (lipoamide dehydrogenase from bacterium Neisseria
meningitidis), 1JEH (lipoamide dehydrogenase from yeast Saccharomyces cerevisiae), 1LPF (lipoamide dehydrogenase from bacterium Pseudomonas
fluorescens), 1EBD (dihydrolipoamide dehydrogenase from bacterium Bacillus stearothermophilus), and 1DXL (lipoamide dehydrogenase from pea Pisum
sativum). Bold type indicates conserved residues. The FAD-binding motif is marked as ‘#’ and the NAD-binding motif is marked as ‘$’.
Fig. 4. Non-canonical interactions in the structure of SARS-CoV ORF3.
The residue pairs involved are (colored blue): Tyr160 (donor) and Phe43
(acceptor), Phe43 (donor) and Tyr206 (acceptor).

Finally, the non-canonical interactions in the ORF3 protein
structure were identified by the NCI program. The results
showed three pairs of main chain-side chain
interactions: Tyr160 (donor) and Phe43 (acceptor), Phe43
(donor) and Tyr206 (acceptor), and Ile232 (donor) and
Phe231 (acceptor). Among these interactions, Phe43 forms
two N-H···π bonds in a sandwich fashion: one donated to
Tyr206 and one accepted from Tyr160, as also found in human
rac1 [15] and the SARS-CoV main protease [16]. These
non-canonical interactions anchor the large helix containing
Phe43 to the two loops containing Tyr160 and Tyr206, and
hence stabilize the structure of the ORF3 protein (Fig. 4).
These results can be used for the rational design of
mutagenesis experiments and for analysis of the conservation
of interactions at functional sites. In recent years,
non-canonical interactions have been shown to be important
for the stability of protein structure [17-19] and for
ligand recognition [20].
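A geometric screen for a candidate N-H···π contact can be sketched as follows. This is a minimal illustration, not the NCI program's method: it uses only the donor nitrogen position and the aromatic ring atoms, and both cutoffs (`MAX_DISTANCE`, `MAX_ANGLE`) are assumed values chosen for the example. The coordinates are a synthetic planar hexagon, not atoms from the ORF3 model.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

# Illustrative cutoffs only; these are NOT the NCI program's parameters.
MAX_DISTANCE = 4.5  # angstroms, donor N to ring centroid (assumed)
MAX_ANGLE = 30.0    # degrees, deviation from the ring normal (assumed)


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ring_geometry(donor_n: Vec, ring_atoms: List[Vec]) -> Tuple[float, float]:
    """Return (N-to-centroid distance, angle to the ring normal in degrees).

    Assumes a planar ring and a donor that is not located at the centroid.
    """
    count = len(ring_atoms)
    centroid = (sum(a[0] for a in ring_atoms) / count,
                sum(a[1] for a in ring_atoms) / count,
                sum(a[2] for a in ring_atoms) / count)
    # Ring normal from two adjacent centroid-to-atom vectors.
    normal = cross(sub(ring_atoms[0], centroid), sub(ring_atoms[1], centroid))
    to_donor = sub(donor_n, centroid)
    distance = math.sqrt(dot(to_donor, to_donor))
    cos_angle = abs(dot(normal, to_donor)) / (math.sqrt(dot(normal, normal)) * distance)
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    return distance, angle


def is_candidate(donor_n: Vec, ring_atoms: List[Vec]) -> bool:
    """True if the donor sits close to, and roughly above, the ring face."""
    distance, angle = ring_geometry(donor_n, ring_atoms)
    return distance <= MAX_DISTANCE and angle <= MAX_ANGLE


# Synthetic six-membered ring in the xy-plane (radius 1.4 angstroms).
ring = [(1.4 * math.cos(math.radians(60 * k)),
         1.4 * math.sin(math.radians(60 * k)),
         0.0) for k in range(6)]

above = is_candidate((0.0, 0.0, 3.5), ring)   # donor over the ring face
beside = is_candidate((5.0, 0.0, 0.0), ring)  # donor in the ring plane
print(above, beside)
```

A donor directly above the ring face passes both criteria, whereas a donor lying in the ring plane fails; real analyses would also consider the hydrogen position and the specific criteria of the program used.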